Recap of session 5

Transforming data with dplyr
- select(): pick variables/columns by their names
- mutate(): create new variables/columns based on existing ones
- arrange(): reorder rows
- filter(): pick rows by their values
- summarize(): collapse many rows down to a single summary
- group_by(): perform operations at a group level
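
As a quick refresher, here is a minimal sketch of each verb applied to the built-in mtcars dataset. The choice of mtcars, the derived column `wt_kg`, and the result names are illustrative assumptions, not taken from session 5.

```r
library(dplyr)

# select(): keep only the mpg and wt columns
cols <- select(mtcars, mpg, wt)

# mutate(): add a column computed from an existing one
# (wt is recorded in units of 1000 lbs; 1000 lbs is about 453.592 kg)
with_kg <- mutate(mtcars, wt_kg = wt * 453.592)

# arrange(): reorder rows by increasing mpg
sorted <- arrange(mtcars, mpg)

# filter(): keep only the rows where mpg is below 15
low_mpg <- filter(mtcars, mpg < 15)

# summarize(): collapse all rows into a single summary row
overall <- summarize(mtcars, mean_mpg = mean(mpg))

# group_by() + summarize(): one summary row per value of cyl
by_cyl <- summarize(group_by(mtcars, cyl), mean_mpg = mean(mpg))
```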

Recap of session 5

ALL of these functions take:

A dataset, and
Instructions on what to do with the dataset.

The dataset is either:

The first argument within the function’s parentheses, e.g.

select(df, day)

Or passed to the function through a “pipe” %>%, e.g.

df %>% select(day)
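
To check that the two forms really are interchangeable, a small sketch like the following can help. It uses the built-in mtcars dataset and its mpg column in place of df and day, which is an assumption made only so the example runs on its own.

```r
library(dplyr)

# Dataset given as the first argument
first_arg <- select(mtcars, mpg)

# Dataset passed in through the pipe
piped <- mtcars %>% select(mpg)

# Both calls produce the same data frame
identical(first_arg, piped)
```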

Recap of session 5

ALL of these functions return a dataset!

You can do three things with this returned dataset:

Nothing, in which case it prints to screen.
Save it by assigning it to a variable.
Don’t save it, but pass it on to another function using a “pipe” %>%
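
The three options can be sketched as follows, assuming the built-in mtcars dataset; the variable name `low_mpg` is hypothetical.

```r
library(dplyr)

# 1. Do nothing with the result: it is printed to the screen
filter(mtcars, mpg < 15)

# 2. Save the result by assigning it to a variable (nothing is printed)
low_mpg <- filter(mtcars, mpg < 15)

# 3. Don't save it, but pipe it straight into another function
mtcars %>%
    filter(mpg < 15) %>%
    select(wt, mpg)
```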

`%>%` syntax with `dplyr`

Take the mtcars dataset, select just the wt and mpg columns, then keep only the rows with mpg < 15

mtcars %>% 
    select(wt, mpg) %>% 
    filter(mpg < 15)

Agenda for today

Review of syntax in R
- Function syntax
- %>% syntax with dplyr
- + syntax with ggplot2
A deeper look at functions
tidyr package: gather() and separate()

Functions: R’s workhorse

A function is a named block of code which

Takes in 1 or more inputs from the user,
Performs a specific task, and
Returns an output to the user.

(Source: practicalli.github.io)

(Source: codehs.gitbooks.io)
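
To make the three parts concrete, here is a minimal sketch of a user-defined function. The function `fahrenheit_to_celsius` is a hypothetical example written for illustration; it is not part of base R or any package.

```r
# Input: a temperature in degrees Fahrenheit (temp_f)
# Task: apply the conversion formula
# Output: the value of the last expression, in degrees Celsius
fahrenheit_to_celsius <- function(temp_f) {
    (temp_f - 32) * 5 / 9
}

fahrenheit_to_celsius(212)
## [1] 100
```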

We use functions in R all the time

We’ve already seen a number of functions in R! For example,

is.character("123")

## [1] TRUE

The function is.character takes the input given to it in the parentheses and returns TRUE or FALSE, depending on whether the input is of type character or not.

Others we’ve seen: str(), head(), rm(), ggplot(), select(), …

We can see what a function does by typing in ? followed by the function name in the R console.

?is.character

Function syntax

The most important syntax in R is the function call. All R syntax has function calls underlying it.

A function call consists of:

Function name
Parentheses, and
A list of arguments within the parentheses

function_name(<inputs to the function>,
              <arguments which change 
              how the function operates>)

Function example

function_name(<inputs to the function>,
              <arguments which change 
              how the function operates>)

x <- c(-5, -3, -1, 1, 3, NA)
mean(x)

## [1] NA

mean(x, na.rm = TRUE)

## [1] -1

Function calls read “inside out”

abs(x): If x is positive, return x. If x is negative, return x without the negative sign.

mean(abs(x), na.rm = TRUE)

## [1] 2.6
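
One way to see the inside-out order is to split the nested call into separate steps. This is only a sketch; the intermediate name `abs_x` is introduced here for illustration.

```r
x <- c(-5, -3, -1, 1, 3, NA)

# Step 1 (innermost call): absolute values; NA stays NA
abs_x <- abs(x)
abs_x
## [1]  5  3  1  1  3 NA

# Step 2 (outer call): mean of the result, ignoring the NA
mean(abs_x, na.rm = TRUE)
## [1] 2.6
```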

The pipe operator `%>%`

%>% is implemented by the magrittr package
When the dplyr package is loaded, %>% becomes available too (dplyr re-exports it from magrittr)
%>% is “syntactic sugar”: it does not add new functionality, it just makes code easier to read
Whatever is on the left of %>% becomes the first argument in the function on the right of %>%

library(magrittr)
x %>% abs() %>% mean(na.rm = TRUE)

## [1] 2.6
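
As a sanity check, the piped version and the nested version can be compared directly. The names `nested` and `piped` below are introduced just for this sketch.

```r
library(magrittr)

x <- c(-5, -3, -1, 1, 3, NA)

# Nested form: read from the inside out
nested <- mean(abs(x), na.rm = TRUE)

# Piped form: read from left to right; each left-hand side becomes
# the first argument of the function on its right
piped <- x %>% abs() %>% mean(na.rm = TRUE)

identical(nested, piped)
## [1] TRUE
```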

`+` syntax with `ggplot2`

library(ggplot2)
ggplot(data = mtcars, mapping = aes(x = wt, y = hp)) +
    geom_point() +
    labs(title = "Horsepower vs. Weight", x = "Weight", 
         y = "Horsepower") +
    theme_classic()
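
Because each `+` adds a component to an existing plot object, the same plot can also be built up one step at a time. The sketch below assumes the intermediate names `p`, `p_points` and `p_final`, which are not in the original slide.

```r
library(ggplot2)

# Start with the data and the aesthetic mapping only (no layers yet)
p <- ggplot(data = mtcars, mapping = aes(x = wt, y = hp))

# Add a layer of points
p_points <- p + geom_point()

# Add labels and a theme
p_final <- p_points +
    labs(title = "Horsepower vs. Weight", x = "Weight",
         y = "Horsepower") +
    theme_classic()

# Printing the object draws the plot
p_final
```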

Why `+` for `ggplot2` only?

(Source: Twitter)

A deeper look at functions

Question: How do we find out what a function does? What inputs does it accept, what does it output, etc…

First answer: Google it! Google “R <function name>”

A (probably) better answer: Documentation in R itself!
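
A few ways to reach the built-in documentation from the R console, using sample() as the example function:

```r
# Open the help page for a function
?sample

# Equivalent, with the function name given as a string
help("sample")

# Run the examples from the bottom of the help page
example(sample)
```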

`sample()`: Description

`sample()`: Usage

What comes after the = sign: default value for that argument
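
The same information can be inspected from the console. As a small sketch, args() prints a function's argument list, and formals() returns the arguments together with their defaults:

```r
# Print the argument list, including default values
args(sample)
## function (x, size, replace = FALSE, prob = NULL) 
## NULL

# Look up the default for a single argument
formals(sample)$replace
## [1] FALSE

# replace is not given here, so its default (FALSE) is used:
# sampling is without replacement and no value can repeat
draw <- sample(x = 1:10, size = 10)
anyDuplicated(draw)
## [1] 0
```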

`sample()`: Arguments

How does R know which arguments we are referring to?

sample(x = 1:10, size = 10)

##  [1]  8  7  2  3  1  5  9 10  4  6

sample(1:10, 10, TRUE)

##  [1]  9  2  1 10 10  7 10  7  8  3

sample(1:10, TRUE, size = 5)

## [1] 3 8 7 6 2
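
R matches arguments in three passes: first by exact name, then by partial name, and finally by position for whatever is left over. The sketch below checks this by fixing the random seed before each call, so that any difference in output could only come from how the arguments were matched. The seed value and the result names are arbitrary choices for this illustration.

```r
# All arguments named
set.seed(42)
by_name <- sample(x = 1:10, size = 5, replace = TRUE)

# No names: values are matched to x, size, replace in order
set.seed(42)
by_position <- sample(1:10, 5, TRUE)

# Mixed: size is matched by name first, then 1:10 and TRUE fill the
# remaining arguments in order, i.e. x and replace
set.seed(42)
mixed <- sample(1:10, TRUE, size = 5)

identical(by_name, by_position)
## [1] TRUE
identical(by_name, mixed)
## [1] TRUE
```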

`tidyr::gather()`

Example: a dataset of the number of cases in each country, with one column per year

df

## # A tibble: 3 x 3
##   country     `1999` `2000`
##   <chr>        <dbl>  <dbl>
## 1 Afghanistan    745   2666
## 2 Brazil       37737  80488
## 3 China       212258 213766

`tidyr::gather()`